Abstract

The declustering problem is to allocate given data on parallel working storage devices in such a manner that typical requests find their data evenly distributed on the devices. Using deep results from discrepancy theory, we improve previous work of several authors concerning range queries to higher-dimensional data. We give a declustering scheme with an additive error of $O_d(\log^{d-1} M)$ independent of the data size, where $d$ is the dimension, $M$ the number of storage devices and $d - 1$ does not exceed the smallest prime power in the canonical decomposition of $M$ into prime powers. In particular, our schemes work for arbitrary $M$ in dimensions two and three. For general $d$, they work for all $M \ge d - 1$ that are powers of two. Concerning lower bounds, we show that a recent proof of an $\Omega_d(\log^{\frac{d-1}{2}} M)$ bound contains an error. We close the gap in the proof and thus establish the bound.
1 Introduction

The last decade saw dramatic improvements in computer processing speeds and storage capacities. Nowadays, the bottleneck in data-intensive applications is the time needed to retrieve typically large amounts of data from external storage devices. One idea to overcome this obstacle is to distribute the data on the disks of multi-disk systems so that it can be retrieved in parallel. Ideally, this declustering reduces the retrieval time by a factor equal to the number of disks. The data allocation is determined by so-called declustering schemes. The schemes should allocate the data in such a manner that typical requests find their data evenly distributed on the disks.

We consider the problem of declustering uniform multi-dimensional data that is arranged in a multi-dimensional grid. There are many data-intensive applications that deal with this kind of data, especially multi-dimensional databases [CMA+97, GM91, IRR99]. A range query $Q$ requests the data blocks that are associated with a hyper-rectangular subspace of the grid. Since we will not deal with syntactic issues of queries, we may identify a query with the set of requested blocks. Consequently, $|Q|$ denotes the number of requested blocks.

The response time of a query $Q$ is (proportional to) the maximum number of blocks of $Q$ that are assigned to the same disk (hence we assume identical disks). For an ideal declustering scheme for a system with $M$ disks, this would be $|Q|/M$ for all queries $Q$. As we will see, this aim cannot be achieved. The quality of a declustering scheme is measured by the worst-case (over all queries $Q$) additive deviation of the response time from the ideal value $|Q|/M$.
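To make this performance measure concrete, here is a minimal Python sketch that computes the response time of a range query and its additive deviation from $|Q|/M$. The cyclic assignment $(x, y) \mapsto (x + y) \bmod M$ is an arbitrary scheme chosen only for illustration (it is not one of the schemes discussed in this paper), disks are numbered $0, \dots, M-1$ instead of $1, \dots, M$, and all function names are hypothetical.

```python
from itertools import product

def response_time(scheme, query, num_disks):
    """Largest number of blocks of `query` that `scheme` stores on a single disk."""
    load = [0] * num_disks
    for block in query:
        load[scheme(block)] += 1
    return max(load)

def additive_error(scheme, query, num_disks):
    """Deviation of the response time from the ideal value |Q|/M."""
    return response_time(scheme, query, num_disks) - len(query) / num_disks

def range_query(lo, hi):
    """All blocks of the box [lo_1..hi_1] x ... x [lo_d..hi_d] of the grid."""
    return list(product(*(range(a, b + 1) for a, b in zip(lo, hi))))

M = 4

def cyclic_scheme(block):
    """Illustrative assignment (x, y) -> (x + y) mod M; disks are 0..M-1."""
    return sum(block) % M

Q = range_query((1, 1), (3, 5))        # a 3 x 5 range query, |Q| = 15
print(response_time(cyclic_scheme, Q, M), additive_error(cyclic_scheme, Q, M))  # 4 0.25
```

Here the disk loads are 4, 3, 4 and 4, so the response time exceeds the ideal value $15/4$ by $1/4$.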

The declustering problem for range queries is an intensively studied problem, and a number of schemes lOBSOI-il IFAC;AA98I IXPnOl IFB93j have been developed in the last twenty years. It was an important turning point when discrepancy theory was connected to declustering.

Before the use of discrepancy theory, no provable performance bounds were known for arbitrary dimension $d$. Such bounds existed only for a few rather restricted declustering schemes in two dimensions: for the scheme proposed in [CBS03], a proof for the average performance is given if the number $M$ of disks is a Fibonacci number. For the construction of the scheme in [AP00], $M$ has to be a power of 2.

A breakthrough was marked by noting that the declustering problem is a discrepancy problem. For the case $d = 2$, Sinha, Bhatia and Chen [SBC03] as well as Anstee, Demetrovics, Katona and Sali [ADKS00] developed declustering schemes for all $M$ and proved their asymptotically optimal behavior. The schemes of Sinha et al. [SBC03] are based on two-dimensional low-discrepancy point sets. They also give generalizations to arbitrary dimension, but without bounds on the error.

Both papers show a lower bound of $\Omega(\log M)$ for the additive error of any declustering scheme in dimension two. The result of Anstee et al. [ADKS00] applies to latin square type colorings only, but their proof can easily be extended to the general case as well. Sinha et al. [SBC03] also claim a bound of $\Omega_d(\log^{\frac{d-1}{2}} M)$ for arbitrary dimension, but their proof contains an error that is critical for $d \ge 3$ (cf. Section 3).

The first non-trivial upper bounds for declustering schemes in arbitrary dimension were proposed by Chen and Cheng [CC02], who present two schemes for the $d$-dimensional declustering problem. The first one has an additive error of $O_d(\log^{d-1} M)$, but works only if $M = p^t$ for some $t \in \mathbb{N}$ and $p$ is a prime such that $d \le p$. The second one works for arbitrary $M$, but the error increases with the size of the data. (Note that all other bounds stated in this paper are independent of the data size.)

Our Results: We work on both upper and lower bounds. For the upper bound, we present an improved scheme that yields an additive error of $O_d(\log^{d-1} M)$ for all values of $M$ (independent of the data size) and all $d$ such that $d \le q_1 + 1$, where $q_1$ is the smallest factor in the canonical decomposition of $M$ into prime powers. This compares to the current best declustering scheme (with worst-case additive error independent of the data size) due to Chen and Cheng [CC02] as follows. Its worst-case additive error is of the same order of magnitude as ours, but it has stronger restrictions on $M$: it works only if $M = p^t$ is a power of a prime and if $d \le p$. Note that our scheme in the case $M = p^t$ requires only $d \le p^t + 1$. Thus, in particular, for $M$ being a power of two our scheme can be used in every dimension $d \le M + 1$, whereas the scheme in [CC02] only works for dimension $d = 2$. This, and the fact that our scheme can be used for all $M$ in dimensions 2 and 3, is useful from the viewpoint of applications. After the preparation of this paper and its conference version [DHW04], the journal version [CC04] of Chen and Cheng [CC02] was published. There, the quite strict limitations of [CC02] could be relaxed to the ones obtained in this paper.

We also show that the latin hypercube construction used by Chen and Cheng [CC02, CC04] and in our work is much better than proven there. Where they show that the final scheme has an error of at most $2^d$ times the one of the latin hypercube coloring, we show that both errors are the same.

For the lower bound, we present the first correct proof of the $\Omega_d(\log^{\frac{d-1}{2}} M)$ bound for dimension $d \ge 3$ and of the $\Omega(\log M)$ bound for dimension $d = 2$. This is particularly interesting with regard to a recent result of Chedid [Che04]. There, a declustering scheme is presented that works for $2^t$ ($t \in \mathbb{N}$) disks in dimension $d = 2$, and it is claimed that it has an additive error of at most 3.



2 Discrepancy Theory 

In this section, we sketch the connection between the declustering problem 
and discrepancy theory. 



2.1 Combinatorial Discrepancy 

Recall that the declustering problem is to assign data blocks from a multi-dimensional grid to $M$ storage devices (disks) in a balanced manner. The aim is that range queries use all storage devices to a similar extent. More precisely, our grid is $V = [n_1] \times \dots \times [n_d]$ for some positive integers $n_1, \dots, n_d$.¹ A query $Q$ requests the data assigned to a rectangle (or box) $[x_1..y_1] \times \dots \times [x_d..y_d]$ for some integers $1 \le x_i \le y_i \le n_i$. We identify a query with the set of blocks it requests, i.e., $Q = [x_1..y_1] \times \dots \times [x_d..y_d]$.

¹We use the notations $[n] := \{1, 2, \dots, n\}$ and $[n..m] := \{k \in \mathbb{N} \mid n \le k \le m\}$ for $n, m \in \mathbb{N}$, $n \le m$.
We assume that the time to process a query is proportional to the maximum number of requested data blocks that are stored on a single device. We represent the assignment of data blocks to devices through a mapping $\chi : V \to [M]$. The processing time of the query $Q$ then is $\max_{i \in [M]} |\chi^{-1}(i) \cap Q|$, where, as usual, $\chi^{-1}(i) = \{v \in V : \chi(v) = i\}$. Clearly, no declustering scheme can do better than $|Q|/M$. Hence a natural performance measure is the additive deviation from this lower bound. We are interested in the worst-case behavior. Thus we are looking for declustering schemes such that $\max_Q \max_{i \in [M]} \left( |\chi^{-1}(i) \cap Q| - \frac{1}{M}|Q| \right)$ is small.

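This makes the problem a combinatorial discrepancy problem in $M$ colors. Denote by $\mathcal{E}$ the set of all rectangles in $V$. Then $\mathcal{H} = (V, \mathcal{E})$ is a hypergraph. For a coloring $\chi : V \to [M]$, the discrepancy of a hyperedge $E \in \mathcal{E}$ with respect to $\chi$ is

$$\operatorname{disc}(E, \chi) := \max_{i \in [M]} \left| |\chi^{-1}(i) \cap E| - \frac{1}{M} |E| \right|,$$

the discrepancy of $\mathcal{H}$ with respect to $\chi$ is

$$\operatorname{disc}(\mathcal{H}, \chi) := \max_{i \in [M],\, E \in \mathcal{E}} \left| |\chi^{-1}(i) \cap E| - \frac{1}{M} |E| \right|,$$

and the discrepancy of $\mathcal{H}$ in $M$ colors is

$$\operatorname{disc}(\mathcal{H}, M) := \min_{\chi : V \to [M]} \operatorname{disc}(\mathcal{H}, \chi).$$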

These definitions were introduced by Srivastav and the first author in [DS99, DS03], extending the well-known notion of combinatorial discrepancy to any number of colors. Similar notions were used by Biedl et al. [BCC+02] and Babai, Hayes and Kimmel [BHK01]. For our purposes, only positive deviations have to be regarded ("too many blocks on one disk"). We adapt the multi-color discrepancy notion in the obvious way and define the positive discrepancy by

$$\operatorname{disc}^+(\mathcal{H}, \chi) := \max_{i \in [M],\, E \in \mathcal{E}} \left( |\chi^{-1}(i) \cap E| - \frac{1}{M} |E| \right),$$

$$\operatorname{disc}^+(\mathcal{H}, M) := \min_{\chi : V \to [M]} \operatorname{disc}^+(\mathcal{H}, \chi).$$

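Clearly, we have $\frac{1}{M-1} \operatorname{disc}(\mathcal{H}, M) \le \operatorname{disc}^+(\mathcal{H}, M) \le \operatorname{disc}(\mathcal{H}, M)$ for all hypergraphs $\mathcal{H}$. The first inequality follows from the fact that for every $E \in \mathcal{E}$ and every coloring $\chi : V \to [M]$ we have $\sum_{i \in [M]} \left( |\chi^{-1}(i) \cap E| - \frac{1}{M}|E| \right) = |E| - |E| = 0$, so a deficit of $D$ in one color forces an excess of at least $\frac{D}{M-1}$ in some other color.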
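The definitions of $\operatorname{disc}$ and $\operatorname{disc}^+$ can be evaluated by brute force on tiny grids. The sketch below is a minimal illustration under stated assumptions: the helper names are hypothetical, colors are numbered $0, \dots, M-1$, a coloring is a Python dictionary, and exact arithmetic comes from `fractions.Fraction`. It enumerates all boxes of $[n]^d$ and checks the two inequalities above for a random coloring; its running time grows quickly with $n$ and $d$, so it is only meant as a sanity check.

```python
import random
from fractions import Fraction
from itertools import product

def boxes(n, d):
    """All boxes [x_1..y_1] x ... x [x_d..y_d] of the grid [n]^d (coordinates 1..n)."""
    intervals = [(x, y) for x in range(1, n + 1) for y in range(x, n + 1)]
    return product(intervals, repeat=d)

def deviations(coloring, n, d, M):
    """Yield |chi^{-1}(i) ∩ E| - |E|/M for every box E and every colour i in 0..M-1."""
    for E in boxes(n, d):
        counts = [0] * M
        for v in product(*(range(x, y + 1) for x, y in E)):
            counts[coloring[v]] += 1
        size = sum(counts)
        for c in counts:
            yield c - Fraction(size, M)

def disc(coloring, n, d, M):
    """disc(H, chi): largest absolute deviation over all boxes and colours."""
    return max(abs(t) for t in deviations(coloring, n, d, M))

def disc_plus(coloring, n, d, M):
    """disc^+(H, chi): largest positive deviation ("too many blocks on one disk")."""
    return max(deviations(coloring, n, d, M))

random.seed(1)
n, d, M = 4, 2, 3
chi = {v: random.randrange(M) for v in product(range(1, n + 1), repeat=d)}
D, D_plus = disc(chi, n, d, M), disc_plus(chi, n, d, M)
print(D, D_plus, D / (M - 1) <= D_plus <= D)
```

The check holds coloring by coloring, which is slightly stronger than the statement for $\operatorname{disc}(\mathcal{H}, M)$ and $\operatorname{disc}^+(\mathcal{H}, M)$.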

Summarizing the discussion above, we have the following.

Theorem 1. The additive error of an optimal declustering scheme for range queries is $\operatorname{disc}^+(\mathcal{H}, M)$.

Since the central results of this paper are discrepancy bounds independent of the size of the grid, we usually work with the hypergraph $\mathcal{H}_N^d = ([N]^d, \mathcal{E}_N^d)$, $\mathcal{E}_N^d = \{\prod_{i=1}^d [x_i..y_i] \mid 1 \le x_i \le y_i \le N\}$, for some sufficiently large integer $N$. Furthermore, we regard only the case $M \ge 3$. For $M = 2$, a checkerboard coloring yields a declustering scheme with an additive error of $1/2$. We prove the following result.
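A quick brute-force check of the checkerboard claim for $M = 2$, as a sketch: the coloring by parity of the coordinate sum is assumed to be the checkerboard coloring meant here, and the function names are illustrative. Every box contains either equally many vertices of both colors or exactly one more of one color, so the positive discrepancy is exactly $1/2$.

```python
from fractions import Fraction
from itertools import product

def checkerboard(v):
    """Colour 0 or 1 by the parity of the coordinate sum of the vertex v."""
    return sum(v) % 2

def positive_discrepancy(coloring, n, d, M):
    """disc^+ of the box hypergraph on [n]^d, computed by brute force over all boxes."""
    intervals = [(x, y) for x in range(1, n + 1) for y in range(x, n + 1)]
    best = None
    for box in product(intervals, repeat=d):
        cells = list(product(*(range(x, y + 1) for x, y in box)))
        for colour in range(M):
            excess = sum(1 for v in cells if coloring(v) == colour) - Fraction(len(cells), M)
            best = excess if best is None else max(best, excess)
    return best

print(positive_discrepancy(checkerboard, 6, 2, 2))  # 1/2
```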

Theorem 2. Let $M \ge 3$ and $d \ge 2$ be integers and $q_1$ the smallest prime power in the canonical factorization of $M$ into prime powers. Then

(i) $\operatorname{disc}^+(\mathcal{H}_N^d, M) = O_d(\log^{d-1} M)$ for $d \le q_1 + 1$, independent of $N \in \mathbb{N}$,
(ii) $\operatorname{disc}^+(\mathcal{H}_N^d, M) = \Omega_d(\log^{\frac{d-1}{2}} M)$ for $N \ge M$,
(iii) $\operatorname{disc}^+(\mathcal{H}_N^2, M) = \Theta(\log M)$ for $d = 2$ and $N \ge M$.

2.2 Geometric Discrepancy

As mentioned before, the use of geometric discrepancies in the analysis of declustering problems in [SBC03, ADKS00] was a major breakthrough in this area. We refer to the book of Matoušek [Mat99] for both a great introduction and a thorough treatment of geometric discrepancies.

The geometric discrepancy problem is to distribute $n$ points evenly in a geometric setting. For our purposes, we regard discrepancies of point sets in $[0,1)^d$ with respect to axis-parallel boxes. Such a box $R$ is the product $R = \prod_{i=1}^d [x_i, y_i)$ with $0 \le x_i \le y_i \le 1$ for all $i \in [d]$. Our aim is that each box $R$ shall contain approximately $n \operatorname{vol}(R)$ points, where $\operatorname{vol}(R) = \prod_{i=1}^d (y_i - x_i)$ denotes the volume of $R$. Again, discrepancy quantifies the distance to a perfect distribution. The discrepancy of an $n$-point set $\mathcal{P}$ with respect to a box $R$ is defined by

$$D(\mathcal{P}, R) = \big| |\mathcal{P} \cap R| - n \operatorname{vol}(R) \big|,$$
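the discrepancy of $\mathcal{P}$ with respect to the set $\mathcal{R}_d$ of all axis-parallel boxes is

$$D(\mathcal{P}, \mathcal{R}_d) = \sup_{R \in \mathcal{R}_d} D(\mathcal{P}, R),$$

and the discrepancy of $\mathcal{R}_d$ for $n$-point sets is

$$D(n, \mathcal{R}_d) = \inf_{\mathcal{P} \subseteq [0,1)^d,\, |\mathcal{P}| = n} D(\mathcal{P}, \mathcal{R}_d).$$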
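The local discrepancy $D(\mathcal{P}, R)$ is easy to evaluate exactly. The supremum over all boxes is harder, so the following sketch only takes the maximum over boxes whose corner coordinates come from a given finite candidate set; it therefore yields a lower estimate of $D(\mathcal{P}, \mathcal{R}_d)$, not its exact value. The function names and the diagonal point set in the usage example are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations_with_replacement, product

def local_discrepancy(points, box):
    """D(P, R) = | |P ∩ R| - n vol(R) | for the half-open box R = prod_i [x_i, y_i)."""
    inside = sum(all(x <= p[i] < y for i, (x, y) in enumerate(box)) for p in points)
    vol = Fraction(1)
    for x, y in box:
        vol *= y - x
    return abs(inside - len(points) * vol)

def discrepancy_lower_estimate(points, candidates):
    """Maximum of D(P, R) over boxes whose corner coordinates lie in `candidates`.

    This is only a lower estimate of D(P, R_d): the supremum runs over all boxes.
    """
    d = len(points[0])
    sides = list(combinations_with_replacement(sorted(candidates), 2))  # pairs x <= y
    return max(local_discrepancy(points, box) for box in product(sides, repeat=d))

M = 4
diagonal = [(Fraction(2 * i + 1, 2 * M),) * 2 for i in range(M)]   # illustrative point set
grid = [Fraction(k, M) for k in range(M + 1)]
print(discrepancy_lower_estimate(diagonal, grid))  # 1
```

For instance, the empty box $[0, \frac12) \times [\frac12, 1)$ has volume $\frac14$ and contains no point, so its discrepancy is $4 \cdot \frac14 = 1$.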

3 The Lower Bound

To prove our lower bounds, we use classical lower bounds for geometric discrepancies. Roth's [Rot54] famous lower bound for the $L_2$ discrepancy of the axis-parallel boxes immediately implies the following.

Theorem 3 (Roth's lower bound). Let $d \ge 2$. There exists a constant $k > 0$ (depending on $d$) such that for any $n$-point set $\mathcal{P}$ in the unit cube $[0,1)^d$, there is an axis-parallel box $R$ in $[0,1)^d$ with

$$D(\mathcal{P}, R) \ge k \log^{\frac{d-1}{2}} n.$$

It was Schmidt [Sch72] who came up with the sharp lower bound in two dimensions.

Theorem 4 (Schmidt's lower bound). There is a constant $k > 0$ such that for any $n$-point set $\mathcal{P}$ in the unit square $[0,1)^2$, there is an axis-parallel rectangle $R$ in $[0,1)^2$ with

$$D(\mathcal{P}, R) \ge k \log n.$$



The general idea in the proofs of the lower bound for declustering schemes in Sinha et al. [SBC03] and Anstee et al. [ADKS00] (for $d = 2$ only) is the following.

Any low-discrepancy $M$-coloring of $[M]^d$ has color classes of approximately $M^{d-1}$ vertices. By scaling, such a color class yields an $M^{d-1}$-point set $\mathcal{P}$ in $[0,1)^d$. The lower bounds above give a box $R$ with polylogarithmic discrepancy. Round $R$ to a box $\tilde R$ with corners in $\{0, \frac{1}{M}, \dots, \frac{M-1}{M}, 1\}^d$ in such a way that $\tilde R \cap \mathcal{P} = R \cap \mathcal{P}$. Then $R$ and $\tilde R$ have similar volume and hence similar discrepancy. Rescaling $\tilde R$ yields a hyperedge $\hat R$ with combinatorial discrepancy equal to the geometric one of $\tilde R$.

The small, but crucial mistake in the proof of Sinha et al. [SBC03] is hidden in the transfer from the geometric discrepancy setting back to the combinatorial one. Unlike in dimension $d = 2$, rounding $R$ to $\tilde R$ does not yield a constant change in the discrepancy in higher dimensions. The volume difference $|\operatorname{vol}(R) - \operatorname{vol}(\tilde R)|$ is still $O_d(\frac{1}{M})$. However, since the number of points is $M^{d-1}$, the change in the discrepancy can be of order $\Theta_d(M^{d-2})$. This is way too large for $d > 2$.

For this reason, a straightforward generalization of the proof of Anstee et al. [ADKS00] of the lower bound in two dimensions (as attempted in [SBC03]) is not possible. We solve this problem in the following way. Instead of looking at the whole $[M]^d$-grid, we focus on a small subgrid. This reduces the number of points, and hence the change in the discrepancy.

Here is an outline of the proof: Starting with an $M$-coloring of the $[M]^d$-grid, we have to show the existence of a box with positive discrepancy of order $\Omega_d(\log^{\frac{d-1}{2}} M)$. We restrict the search to a small subgrid $[sM]^d$ (with $s$ a multiple of $\frac{1}{M}$) to avoid the above-mentioned problems in the rounding process. The left part of Figure 1 depicts such an $[sM]^d$-subgrid. The crosses represent one color class. We choose this color class in such a way that it contains at least the average number of $s^d M^{d-1}$ vertices of $[sM]^d$. From this color class we get a set of points in the cube $[0,1)^d$ by scaling. This can be seen in the middle part of Figure 1. Using the theorem of Schmidt or Roth, respectively, we find a box $R$ (in the middle section of Figure 1) with large geometric discrepancy. We round this box to a box $\tilde R$ containing the same points as $R$ but fitting to the grid lines stemming from the $[sM]^d$-grid. For the corresponding box $\hat R$ in the $[sM]^d$-grid (the box with the continuous lines in the right section of Figure 1), we estimate the discrepancy using the geometric discrepancy of the box $R$ and the relatively small change in the discrepancy caused by the volume difference between the boxes $R$ and $\tilde R$. Should this large discrepancy be caused by a lack of vertices in one color, we get a lower bound for the positive discrepancy through the following observation: either $\hat R$ or its complement in $[sM]^d$ has a positive discrepancy of the wanted order. Although the complement is not a box, it is the union of at most $2d$ boxes. Thus, at least one of these boxes has a positive discrepancy of order $\Omega_d(\log^{\frac{d-1}{2}} M)$.




Figure 1: Construction of a box with large discrepancy. 



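Proof of Theorem 2 (ii). Clearly, $\operatorname{disc}^+(\mathcal{H}_N^d, M) \ge \operatorname{disc}^+(\mathcal{H}_M^d, M)$ for all $N \ge M$, since every $M$-coloring of $[N]^d$ restricts to an $M$-coloring of $[M]^d$ and every box of $[M]^d$ is a box of $[N]^d$. Hence, we can assume $N = M$. The proof is organized as follows. We first show a lower bound for the $M$-color discrepancy of $\mathcal{H}_M^d$. From this, we derive a lower bound for the positive $M$-color discrepancy of $\mathcal{H}_M^d$.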

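Let $\chi : [M]^d \to [M]$ be an $M$-coloring of $\mathcal{H}_M^d$. Choose an $s \in \big[M^{-\frac{d-2}{d-1}}, 2M^{-\frac{d-2}{d-1}}\big) \cap [0, 1]$ such that $s$ is a multiple of $\frac{1}{M}$. Such an $s$ exists since $\frac{1}{M} \le M^{-\frac{d-2}{d-1}} \le 1$. Without loss of generality, we may assume that color 1 occurs in $[sM]^d$ at least as often as the average color, i.e., $n := |\chi^{-1}(1) \cap [sM]^d| \ge s^d M^{d-1}$.

Claim. There is a box $\hat R \subseteq [sM]^d$ such that

$$\left| |\chi^{-1}(1) \cap \hat R| - \frac{1}{M} |\hat R| \right| = \Omega_d\big(\log^{\frac{d-1}{2}} M\big).$$

Let $k$ be the constant of Theorem 3. If $n > s^d M^{d-1} + \frac{k}{2} \left(\frac{1}{d-1}\right)^{\frac{d-1}{2}} \log^{\frac{d-1}{2}} M$, we clearly have $|n - s^d M^{d-1}| > \frac{k}{2} \left(\frac{1}{d-1}\right)^{\frac{d-1}{2}} \log^{\frac{d-1}{2}} M$, so the box $\hat R = [sM]^d$ satisfies the claim. Therefore, we may assume

$$s^d M^{d-1} \le n \le s^d M^{d-1} + \frac{k}{2} \left(\frac{1}{d-1}\right)^{\frac{d-1}{2}} \log^{\frac{d-1}{2}} M. \qquad (1)$$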
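For every vertex $z = (z_1, z_2, \dots, z_d) \in \chi^{-1}(1) \cap [sM]^d$ we define

$$x_z := \left(\frac{2z_1 - 1}{2sM}, \frac{2z_2 - 1}{2sM}, \dots, \frac{2z_d - 1}{2sM}\right).$$

Let $\mathcal{P} := \{x_z \mid z \in \chi^{-1}(1) \cap [sM]^d\}$. Then $\mathcal{P}$ is an $n$-point set in the unit cube $[0,1)^d$. Estimating the cardinality of $\mathcal{P}$, we get

$$n \ge s^d M^{d-1} \ge M^{-\frac{(d-2)d}{d-1}} M^{d-1} = M^{\frac{-(d^2-2d) + (d-1)^2}{d-1}} = M^{\frac{1}{d-1}}.$$

By Theorem 3 there exists a box $R = \prod_{i=1}^d [x_i, y_i) \subseteq [0,1)^d$ with

$$\big| |R \cap \mathcal{P}| - n \operatorname{vol}(R) \big| \ge k \log^{\frac{d-1}{2}} n \ge k \left(\frac{1}{d-1}\right)^{\frac{d-1}{2}} \log^{\frac{d-1}{2}} M. \qquad (2)$$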

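Now we construct a box $\tilde R = \prod_{i=1}^d [\tilde x_i, \tilde y_i)$ by rounding the $x_i$ and $y_i$ to the nearest multiple of $\frac{1}{sM}$. In case of ties, we round down. This ensures $\mathcal{P} \cap \tilde R = \mathcal{P} \cap R$, as the following argument shows. Let $x_z = \left(\frac{2z_1-1}{2sM}, \dots, \frac{2z_d-1}{2sM}\right) \in \mathcal{P}$. Then $x_z \in R$ is equivalent to $x_i \le \frac{2z_i-1}{2sM} < y_i$ for all $i \in [d]$. Since $\frac{2z_i-1}{2sM}$ lies exactly halfway between the consecutive multiples $\frac{z_i-1}{sM}$ and $\frac{z_i}{sM}$, and ties are rounded down, this holds if and only if $\tilde x_i \le \frac{z_i-1}{sM}$ and $\tilde y_i \ge \frac{z_i}{sM}$ for all $i \in [d]$, which is equivalent to $x_z \in \tilde R$.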
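The rounding argument can be checked numerically. This sketch (hypothetical helper names, exact rational arithmetic via `fractions.Fraction`) places points at the cell centres $x_z$ of an $[sM]^d$ grid, rounds the corners of a box to the nearest multiples of $\frac{1}{sM}$ with ties rounded down, and tests that no point changes membership. The random choice of cells and boxes and the fixed seed are assumptions made only for the example.

```python
import random
from fractions import Fraction
from itertools import product

def round_to_grid(x, step):
    """Round x to the nearest multiple of `step`; ties are rounded down."""
    q, r = divmod(x, step)                 # x = q*step + r with 0 <= r < step
    return (q + 1) * step if 2 * r > step else q * step

def cell_centres(cells, sM):
    """The points x_z = ((2 z_1 - 1)/(2 sM), ..., (2 z_d - 1)/(2 sM)) for z in `cells`."""
    return [tuple(Fraction(2 * zi - 1, 2 * sM) for zi in z) for z in cells]

def in_box(p, box):
    """Membership in the half-open box prod_i [x_i, y_i)."""
    return all(x <= pi < y for pi, (x, y) in zip(p, box))

def rounding_preserves_points(points, box, sM):
    """True if rounding the corners of `box` to multiples of 1/sM keeps P ∩ R unchanged."""
    step = Fraction(1, sM)
    rounded = [(round_to_grid(x, step), round_to_grid(y, step)) for x, y in box]
    return all(in_box(p, box) == in_box(p, rounded) for p in points)

random.seed(0)
sM, d = 6, 2
cells = [z for z in product(range(1, sM + 1), repeat=d) if random.random() < 0.4]
P = cell_centres(cells, sM)
box = [tuple(sorted(Fraction(random.randrange(1001), 1000) for _ in range(2))) for _ in range(d)]
print(rounding_preserves_points(P, box, sM))  # True
```

Rounding ties down matters: a corner that lies exactly on a cell centre must move to the grid line below it, otherwise the point on that corner could leave or enter the box.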



We now quantify the effect of this rounding. The symmetric difference of $R$ and $\tilde R$ is contained in the union of $2d$ boxes such that all their side lengths are at most 1 and one side length of each box is bounded from above by $\frac{1}{2sM}$ (this is due to the rounding process). Hence $|\operatorname{vol}(R) - \operatorname{vol}(\tilde R)| \le 2d \cdot \frac{1}{2sM} = \frac{d}{sM}$. Using $s < 2M^{-\frac{d-2}{d-1}}$, we get

$$s^d M^{d-1} |\operatorname{vol}(R) - \operatorname{vol}(\tilde R)| \le \frac{d}{sM} s^d M^{d-1} = d s^{d-1} M^{d-2} < d 2^{d-1}, \qquad (3)$$

an estimate needed below. Note that choosing $s$ small ensures that the effect of rounding is independent of $M$.
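The parameter choices can be verified with exact integer arithmetic. As a sketch, assume we take $s = k/M$ with $k$ the least integer such that $k^{d-1} \ge M$; this is one admissible multiple of $\frac{1}{M}$ in the required interval, and the function names are hypothetical. The code checks the interval condition, the estimate $n \ge M^{\frac{1}{d-1}}$, and that the rounding term in (3) stays below $d 2^{d-1}$.

```python
from fractions import Fraction

def choose_s(M, d):
    """s = k/M with k the least integer such that k**(d-1) >= M.

    This is one admissible multiple of 1/M in
    [M^(-(d-2)/(d-1)), 2 M^(-(d-2)/(d-1))) ∩ [0, 1]; integers keep it exact.
    """
    k = 1
    while k ** (d - 1) < M:
        k += 1
    return Fraction(k, M)

def check_parameters(M, d):
    """Verify the facts about s used in the proof; return (s, rounding term of (3))."""
    s = choose_s(M, d)
    k = s * M                                    # side length sM of the subgrid
    assert k.denominator == 1 and 0 < s <= 1
    k = k.numerator
    # M^(-(d-2)/(d-1)) <= s < 2 M^(-(d-2)/(d-1))  <=>  M <= k^(d-1) < 2^(d-1) M
    assert M <= k ** (d - 1) < 2 ** (d - 1) * M
    # s^d M^(d-1) >= M^(1/(d-1))  <=>  k^(d(d-1)) >= M^d
    assert k ** (d * (d - 1)) >= M ** d
    # middle term of (3): d s^(d-1) M^(d-2), which must stay below d 2^(d-1)
    rounding_term = d * s ** (d - 1) * Fraction(M) ** (d - 2)
    assert rounding_term < d * 2 ** (d - 1)
    return s, rounding_term

print(check_parameters(1000, 3))  # (Fraction(4, 125), Fraction(384, 125))
```

For $M = 1000$ and $d = 3$ this selects a $32 \times 32 \times 32$ subgrid, and the rounding term $3.072$ is far below the bound $12$, independently of how large $M$ is. For $d = 2$ the choice gives $s = 1$.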

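The combinatorial counterpart of $\tilde R$ is the box

$$\hat R := \left\{ z \in [sM]^d \;\middle|\; \left(\frac{2z_1-1}{2sM}, \dots, \frac{2z_d-1}{2sM}\right) \in \tilde R \right\}.$$

Hence,

$$|\chi^{-1}(1) \cap \hat R| = |\mathcal{P} \cap \tilde R| = |\mathcal{P} \cap R|.$$

One also easily verifies that $|\hat R| = s^d M^d \operatorname{vol}(\tilde R)$. By construction,

$$\begin{aligned}
\left| |\chi^{-1}(1) \cap \hat R| - \tfrac{1}{M} |\hat R| \right|
&= \left| |\mathcal{P} \cap R| - s^d M^{d-1} \operatorname{vol}(\tilde R) \right| \\
&= \left| |\mathcal{P} \cap R| - n \operatorname{vol}(R) + (n - s^d M^{d-1}) \operatorname{vol}(R) + s^d M^{d-1} \big(\operatorname{vol}(R) - \operatorname{vol}(\tilde R)\big) \right| \\
&\ge \left| |\mathcal{P} \cap R| - n \operatorname{vol}(R) \right| - |n - s^d M^{d-1}| - s^d M^{d-1} |\operatorname{vol}(R) - \operatorname{vol}(\tilde R)|.
\end{aligned}$$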
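Observe that we bound the $M$-color discrepancy of $\mathcal{H}_M^d$ from below by the geometric discrepancy of the box $R$ minus two terms, accounting for the fact that $n$ is not exactly $s^d M^{d-1}$ and for the effect of rounding $R$ to $\tilde R$. Hence by (1), (2) and (3),

$$\begin{aligned}
\left| |\chi^{-1}(1) \cap \hat R| - \tfrac{1}{M} |\hat R| \right|
&\ge k \left(\tfrac{1}{d-1}\right)^{\frac{d-1}{2}} \log^{\frac{d-1}{2}} M - \tfrac{k}{2} \left(\tfrac{1}{d-1}\right)^{\frac{d-1}{2}} \log^{\frac{d-1}{2}} M - d 2^{d-1} \\
&= \Omega_d\big(\log^{\frac{d-1}{2}} M\big).
\end{aligned}$$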



Thus, we have shown the claimed existence of a box $\hat R \subseteq [sM]^d$ with $\left| |\chi^{-1}(1) \cap \hat R| - \frac{1}{M}|\hat R| \right| = \Omega_d(\log^{\frac{d-1}{2}} M)$. It remains to prove that this bound also holds for the positive discrepancy. If the discrepancy of $\hat R$ in color 1 is caused by an excess of vertices in color 1, we are done. So let us assume that it is caused by a lack of vertices in color 1. Since $|\chi^{-1}(1) \cap [sM]^d| \ge s^d M^{d-1}$, the complement of $\hat R$ in $[sM]^d$ has at least the same discrepancy as $\hat R$, but caused by an excess of vertices in color 1. Though this complement is not a box, it is the union of at most $2d$ boxes. Therefore, one of these boxes has a positive discrepancy that is at least $\frac{1}{2d}$ times the discrepancy of $\hat R$ in color 1. □
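The last step uses that the complement of a box in $[sM]^d$ splits into at most $2d$ pairwise disjoint boxes. A minimal sketch of one such decomposition follows (function names hypothetical, grid side written as `n`). Since the excess $|\chi^{-1}(1) \cap A| - \frac{1}{M}|A|$ is additive over disjoint sets $A$, one of the at most $2d$ pieces carries at least a $\frac{1}{2d}$ share of the excess of the complement.

```python
from itertools import product

def complement_boxes(box, n):
    """Split [n]^d minus prod_i [a_i..b_i] into at most 2d pairwise disjoint boxes.

    Piece i keeps coordinates j < i inside [a_j..b_j], leaves coordinates j > i free,
    and takes the slab below a_i or above b_i in coordinate i.
    """
    d = len(box)
    pieces = []
    for i, (a, b) in enumerate(box):
        fixed, free = list(box[:i]), [(1, n)] * (d - i - 1)
        if a > 1:
            pieces.append(tuple(fixed + [(1, a - 1)] + free))
        if b < n:
            pieces.append(tuple(fixed + [(b + 1, n)] + free))
    return pieces

def cells(box):
    """The set of grid vertices of a box given as ((a_1, b_1), ..., (a_d, b_d))."""
    return set(product(*(range(a, b + 1) for a, b in box)))

n, R = 5, ((2, 4), (1, 3), (3, 5))
pieces = complement_boxes(R, n)
print(len(pieces), sum(len(cells(p)) for p in pieces), n ** 3 - len(cells(R)))  # 4 98 98
```

A vertex outside the box is covered exactly once, namely by the piece belonging to the first coordinate in which it leaves the box.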



This last argument costs a factor of $2d$ in the implicit constant of the lower bound compared to the approach of Sinha et al. [SBC03].

We briefly show how to use the above to prove the $\Omega(\log M)$ bound for dimension $d = 2$. For this bound, two not completely satisfying proofs exist. Anstee et al. [ADKS00] only treated latin square type colorings of $[M]^2$ and posed it as an open problem to extend their result to arbitrary colorings. The proof in [SBC03] does not have this restriction, but it is not very precise, which in particular helped to hide the error for $d > 2$.

As a simple and clean proof we therefore propose the following: use the same reasoning as in the case of arbitrary dimension $d > 2$, but apply Schmidt's lower bound instead of Roth's. The parameter $s$ can be chosen as 1. In dimension $d = 2$ we do not need small boxes, because the round-off error has an effect on the discrepancy which is of order $O(1)$.
4 The Upper Bound

In this section, we present a declustering scheme showing our upper bound. As in previous work, we use low-discrepancy point sets to construct the declustering scheme. In the following we use the notation of Niederreiter [Nie87]. For an integer $b \ge 2$, an elementary interval in base $b$ is an interval of the form $E = \prod_{i=1}^d [a_i b^{-d_i}, (a_i + 1) b^{-d_i})$ with integers $d_i \ge 0$ and $0 \le a_i < b^{d_i}$ for $1 \le i \le d$. For integers $t, m$ such that $0 \le t \le m$, a $(t, m, d)$-net in base $b$ is a point set of $b^m$ points in $[0,1)^d$ such that all elementary intervals with volume $b^{t-m}$ contain exactly $b^t$ points.
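The net property can be tested directly from this definition by enumerating every shape $(d_1, \dots, d_d)$ with $d_1 + \dots + d_d = m - t$ and counting points per elementary interval. The sketch below does this with exact fractions; the function names are hypothetical, and the two-dimensional Hammersley point set $(i/b^m, \phi_b(i))$, with $\phi_b$ the base-$b$ radical inverse, is used only as a standard example of a $(0, m, 2)$-net in base $b$. It is not the construction of Niederreiter used in this paper.

```python
import math
from fractions import Fraction

def compositions(total, parts):
    """All tuples of `parts` non-negative integers with sum `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def is_net(points, t, m, b):
    """Check the (t, m, d)-net property in base b for exact points in [0, 1)^d."""
    d = len(points[0])
    if len(points) != b ** m:
        return False
    # elementary intervals of volume b^(t-m): exponents d_1 + ... + d_d = m - t
    for exps in compositions(m - t, d):
        counts = {}
        for p in points:
            # index a_i of the interval [a_i b^-d_i, (a_i + 1) b^-d_i) containing p_i
            key = tuple(math.floor(x * b ** e) for x, e in zip(p, exps))
            counts[key] = counts.get(key, 0) + 1
        # b^(m-t) intervals of this shape; each must contain exactly b^t points
        if len(counts) != b ** (m - t) or any(c != b ** t for c in counts.values()):
            return False
    return True

def radical_inverse(i, b):
    """Mirror the base-b digits of i at the radix point (van der Corput)."""
    result, scale = Fraction(0), Fraction(1, b)
    while i:
        i, digit = divmod(i, b)
        result += digit * scale
        scale /= b
    return result

b, m = 3, 3
hammersley = [(Fraction(i, b ** m), radical_inverse(i, b)) for i in range(b ** m)]
print(is_net(hammersley, 0, m, b))  # True
```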

Note that any elementary interval with volume $b^{t-m}$ has discrepancy zero in a $(t, m, d)$-net. Since any subset of an elementary interval of volume $b^{t-m}$ has discrepancy at most $b^t$, and any box can be packed with elementary intervals in such a way that the uncovered part can be covered by $O_d(\log^{d-1} b^m)$ elementary intervals of volume $b^{t-m}$, the following is immediate:

Theorem 5. A $(t, m, d)$-net $\mathcal{P}_{\mathrm{net}}$ in base $b$ with $n = b^m$ points has discrepancy

$$D(\mathcal{P}_{\mathrm{net}}, \mathcal{R}_d) = O_d(\log^{d-1} n).$$

The central argument in our proof of the upper bound is the following result of Niederreiter [Nie87] on the existence of $(0, m, d)$-nets. From the viewpoint of applications it is important that his proof is constructive. Admittedly, this construction is highly involved. We refer to Niederreiter [Nie87] for the details.

Theorem 6. Let $b \ge 2$ be an arbitrary base and $b = q_1 q_2 \cdots q_u$ be the canonical factorization of $b$ into prime powers such that $q_1 < \dots < q_u$. Then for any $m \ge 0$ and $d \le q_1 + 1$ there exists a $(0, m, d)$-net in base $b$.

We use $(0, m, d)$-nets to construct an $M$-coloring of $\mathcal{H}_M^d$ in Lemma 7. For the definition of these colorings, we need the following special elements of $\mathcal{E}_M^d$. A set $\prod_{i=1}^d I_i \subseteq [M]^d$ is called a row of $[M]^d$ if there is an $i \in [d]$ with $I_i = [1..M]$ and $|I_j| = 1$ for all $j \ne i$. In Lemma 8 we use the $M$-coloring of $\mathcal{H}_M^d$ to construct an $M$-coloring of $\mathcal{H}_N^d$ with the same discrepancy.
Lemma 7. Let $\mathcal{P}_{\mathrm{net}}$ be a $(0, d-1, d)$-net in base $M$ in $[0,1)^d$. Then there is an $M$-coloring $\chi_M$ of $\mathcal{H}_M^d = ([M]^d, \mathcal{E}_M^d)$ such that all rows of $[M]^d$ contain every color exactly once² and

$$\operatorname{disc}(\mathcal{H}_M^d, \chi_M) \le D(\mathcal{P}_{\mathrm{net}}, \mathcal{R}_d).$$



Proof. The net $\mathcal{P}_{\mathrm{net}}$ consists of $M^{d-1}$ points, and all elementary intervals with volume $M^{-(d-1)}$ contain exactly one point. In particular, all subsets $\prod_{i=1}^d I_i$ of $[0,1)^d$ such that there is an $i \in [d]$ with $I_i = [0,1)$ and for all $j \ne i$ there exist $a_j \in [0..M-1]$ with $I_j = \left[\frac{a_j}{M}, \frac{a_j+1}{M}\right)$ contain exactly one point.

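We construct a coloring $\chi_M$ of $\mathcal{H}_M^d = ([M]^d, \mathcal{E}_M^d)$ corresponding to the set $\mathcal{P}_{\mathrm{net}}$. Let $\mathcal{P} := \left\{x \in [M]^d \;\middle|\; \mathcal{P}_{\mathrm{net}} \cap \prod_{i=1}^d \left[\frac{x_i - 1}{M}, \frac{x_i}{M}\right) \ne \emptyset \right\}$. Then each row of $[M]^d$ contains exactly one point of $\mathcal{P}$. We define the coloring $\chi_M : [M]^d \to [M]$ by $\chi_M(y, x_2, \dots, x_d) = i$ for all $x = (x_1, x_2, \dots, x_d) \in \mathcal{P}$ and $i, y \in [M]$ such that $y \equiv x_1 + (i - 1) \pmod M$. Hence $\mathcal{P}$ receives color 1, color class 2 is obtained by shifting $\mathcal{P}$ along the first coordinate, and so on. This defines an $M$-coloring $\chi_M$ of $\mathcal{H}_M^d = ([M]^d, \mathcal{E}_M^d)$. Since each color class is constructed by shifting the first color class, each row of $[M]^d$ contains every color exactly once. Thus, each whole row of $\mathcal{H}_M^d$ has discrepancy zero.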

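For this coloring it is sufficient to calculate $\max_{R \in \mathcal{E}_M^d} \left| |\chi_M^{-1}(1) \cap R| - \frac{1}{M} |R| \right|$, because for each color $i \in [M]$ and each box $R \in \mathcal{E}_M^d$ we get the same discrepancy for the box $R'$, which is a copy of $R$ shifted along the first dimension by $i - 1$ and possibly wrapped around, with respect to color 1. If $R'$ is wrapped around, it is the union of two boxes. Since whole rows have discrepancy zero, the discrepancy of this union equals, up to sign, the discrepancy of the box between the two parts, and we have

$$\operatorname{disc}(\mathcal{H}_M^d, \chi_M) = \max_{i \in [M],\, R \in \mathcal{E}_M^d} \left| |\chi_M^{-1}(i) \cap R| - \frac{1}{M} |R| \right| = \max_{R \in \mathcal{E}_M^d} \left| |\mathcal{P} \cap R| - \frac{1}{M} |R| \right|.$$

Let $R = \prod_{i=1}^d [x_i..y_i]$ be an arbitrary hyperedge of $\mathcal{H}_M^d$. The associated box in $[0,1)^d$ is $\tilde R = \prod_{i=1}^d \left[\frac{x_i - 1}{M}, \frac{y_i}{M}\right)$. Then $|\mathcal{P} \cap R| = |\mathcal{P}_{\mathrm{net}} \cap \tilde R|$ and $|R| = M^d \operatorname{vol}(\tilde R)$.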

²Some authors call this a permutation scheme for $[M]^d$.
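Thus the combinatorial discrepancy of $R$ equals the geometric one of $\tilde R$. We have

$$\left| |\mathcal{P}_{\mathrm{net}} \cap \tilde R| - M^{d-1} \operatorname{vol}(\tilde R) \right| \le D(\mathcal{P}_{\mathrm{net}}, \mathcal{R}_d).$$

Hence we get $\operatorname{disc}(\mathcal{H}_M^d, \chi_M) \le D(\mathcal{P}_{\mathrm{net}}, \mathcal{R}_d)$. □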
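The shift construction of $\chi_M$ only uses the fact that every row of $[M]^d$ contains exactly one cell of $\mathcal{P}$. The sketch below therefore takes such a cell set directly instead of computing it from a net; the example set $\{x \in [M]^3 : x_1 + x_2 + x_3 \equiv 0 \pmod M\}$ is a hypothetical stand-in with this property, not a set obtained from Niederreiter's construction, and the function names are illustrative. Colors are $1, \dots, M$ as in the lemma.

```python
from itertools import product

def shift_coloring(P, M, d):
    """chi_M of Lemma 7: vertex (y, x_2, ..., x_d) gets colour i in 1..M
    iff y = x_1 + (i - 1) (mod M) for the unique (x_1, x_2, ..., x_d) in P."""
    first = {x[1:]: x[0] for x in P}             # first coordinate of P on each row
    assert len(P) == len(first) == M ** (d - 1), "P must meet every row exactly once"
    return {(y,) + rest: (y - x1) % M + 1
            for rest, x1 in first.items() for y in range(1, M + 1)}

def rows_contain_every_colour_once(chi, M, d):
    """Check that every row of [M]^d (one free coordinate) is a permutation of 1..M."""
    for axis in range(d):
        for rest in product(range(1, M + 1), repeat=d - 1):
            row = [chi[rest[:axis] + (v,) + rest[axis:]] for v in range(1, M + 1)]
            if sorted(row) != list(range(1, M + 1)):
                return False
    return True

M, d = 5, 3
P = {x for x in product(range(1, M + 1), repeat=d) if sum(x) % M == 0}   # illustrative
chi_M = shift_coloring(P, M, d)
print(rows_contain_every_colour_once(chi_M, M, d))  # True
```

In dimension $d = 2$ the set $\mathcal{P}$ is just the graph of a permutation of $[M]$, and $\chi_M$ is a latin square.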

In the previous lemma we constructed an $M$-coloring of the $[M]^d$-grid with low discrepancy. We now extend this coloring to $[N]^d$-grids for arbitrary $N \in \mathbb{N}$. We do this by plastering the $[N]^d$-grid with copies of the $[M]^d$-grid coloring.

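Lemma 8. Let $\chi_M$ be an $M$-coloring of $\mathcal{H}_M^d$ such that all rows of $[M]^d$ contain every color exactly once, and let $\chi$ be the coloring of $\mathcal{H}_N^d$ defined by $\chi(x_1, \dots, x_d) = \chi_M(y_1, \dots, y_d)$ with $x_i \equiv y_i \pmod M$ for $i \in [d]$, $x_i \in [N]$, $y_i \in [M]$. Then

$$\operatorname{disc}(\mathcal{H}_N^d, \chi) = \operatorname{disc}(\mathcal{H}_M^d, \chi_M).$$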

Proof. The proof is organized in the following way. Pick an arbitrary box in the $[N]^d$-grid. Using the fact that whole rows in the $[M]^d$-grid coloring $\chi_M$ have discrepancy zero, we can ignore all of the box except its corners. By construction, these corners can all be found in one common $[M]^d$-subgrid. Since whole rows (in the $[M]^d$-grid coloring $\chi_M$) have discrepancy zero, taking complements in each dimension does not alter the discrepancy. We thus obtain a box in the $[M]^d$-grid that has the same discrepancy as the original box.

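Let $R = \prod_{i=1}^d [x_i..y_i]$ be an arbitrary hyperedge of $\mathcal{H}_N^d$. For all $i \in [d]$ there exist unique $\bar x_i, \bar y_i \in [M]$ with $\bar x_i \equiv x_i \pmod M$ and $\bar y_i \equiv y_i \pmod M$, respectively. If $\bar x_i \le \bar y_i$, we set $\tilde x_i := \bar x_i$ and $\tilde y_i := \bar y_i$. Otherwise we set $\tilde x_i := \bar y_i + 1$ and $\tilde y_i := \bar x_i - 1$. We define the rectangles

$$\begin{aligned}
R_l &:= [\bar x_1..M] \times [x_2..y_2] \times \dots \times [x_d..y_d], \\
R_r &:= [1..\bar y_1] \times [x_2..y_2] \times \dots \times [x_d..y_d], \\
R_0 &:= [\tilde x_1..\tilde y_1] \times [x_2..y_2] \times \dots \times [x_d..y_d].
\end{aligned}$$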

Using the fact that whole rows have discrepancy zero and the fact that the coloring $\chi$ is invariant under shifts by multiples of $M$ in any dimension, we get for all $i \in [M]$

$$\left| |R \cap \chi^{-1}(i)| - \tfrac{1}{M} |R| \right| = \left| |R_l \cap \chi^{-1}(i)| - \tfrac{1}{M} |R_l| + |R_r \cap \chi^{-1}(i)| - \tfrac{1}{M} |R_r| \right| = \left| |R_0 \cap \chi^{-1}(i)| - \tfrac{1}{M} |R_0| \right|.$$



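Thus, $\operatorname{disc}(R, \chi) = \operatorname{disc}(R_0, \chi)$. Applying this successively in every coordinate, we get

$$\operatorname{disc}(R, \chi) = \operatorname{disc}\Big(\prod_{i=1}^d [x_i..y_i], \chi\Big) = \operatorname{disc}\Big(\prod_{i=1}^d [\tilde x_i..\tilde y_i], \chi_M\Big).$$

This completes the proof. □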



Lemma 8 is a remarkable improvement of Theorem 4.2 in [CC02], where $\operatorname{disc}(\mathcal{H}_N^d, \chi) \le 2^d \operatorname{disc}(\mathcal{H}_M^d, \chi_M)$ is shown. Note that this reduces the implicit constant in the upper bound by a factor of $2^d$.

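It remains to show that the upper bound in Theorem 2 follows from Lemma 7 and Lemma 8.

Proof of Theorem 2 (i). Let $M \ge 3$ and $d \ge 2$ be integers with $d \le q_1 + 1$, where $q_1$ is the smallest prime power in the canonical factorization of $M$ into prime powers. Theorem 6 provides a $(0, d-1, d)$-net $\mathcal{P}_{\mathrm{net}}$ in base $M$ in $[0,1)^d$. Using Lemma 7, we get an $M$-coloring $\chi_M$ of $\mathcal{H}_M^d$ such that all rows contain each color exactly once and $\operatorname{disc}(\mathcal{H}_M^d, \chi_M) \le D(\mathcal{P}_{\mathrm{net}}, \mathcal{R}_d)$. With Lemma 8 and Theorem 5 we have $\operatorname{disc}^+(\mathcal{H}_N^d, M) \le \operatorname{disc}(\mathcal{H}_N^d, M) \le D(\mathcal{P}_{\mathrm{net}}, \mathcal{R}_d) = O_d(\log^{d-1} M^{d-1}) = O_d(\log^{d-1} M)$. □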



5 Conclusion

We gave lower and upper bounds for the declustering problem of range queries to higher-dimensional grids. This paper contains the first complete proof of the lower bound $\Omega_d(\log^{\frac{d-1}{2}} M)$ for arbitrary values of $M$ and $d$.

We proposed a declustering scheme that has an additive error of $O_d(\log^{d-1} M)$ with the sole condition that $d \le q_1 + 1$, where $q_1$ is the smallest prime power in the canonical factorization of $M$ into prime powers. This improves the former best declustering schemes of Chen and Cheng [CC02], where either the bounds depend on the data size $N^d$ or $M = p^t$ with $p \ge d$ was required for some prime $p$ and $t \in \mathbb{N}$. Furthermore, Lemma 8 improves the analysis of Chen and Cheng [CC02, CC04] of the discrepancy of latin square colorings by a factor of $2^d$.

The natural problem arising from this work is to close the gap between the lower and upper bound. However, this is probably very hard, because the corresponding problem for geometric discrepancies of boxes is extremely difficult. Closing the gap between the $\Omega_d(\log^{\frac{d-1}{2}} n)$ lower and the $O_d(\log^{d-1} n)$ upper bound for $D(n, \mathcal{R}_d)$ was already called 'the great open problem' by Beck and Chen [BC87]. Since then, no further progress has been made on the general problem. Note that a serious error was recently found in the proof of a slight improvement due to Baker [Bak99], so that the result was withdrawn by the author [reported by József Beck, Oberwolfach Seminar on Discrepancy Theory and Applications, March 2004].